We’ve looked at a few different ways in which we can build models this week, including how to prepare them properly. This weekend we’ll build a multiple linear regression model on a dataset which will need some preparation. The data can be found in the data folder, along with a data dictionary.
We want to investigate the avocado dataset, and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week in order to prepare your dataset and find appropriate predictors. Once you’ve built your model use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus either on building an explanatory or a predictive model, or both if you are feeling energetic!
As part of the MVP, we want you not just to run the code but also to have a go at interpreting the results, writing your thinking as comments in your script.
Hints and tips
region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no ‘right’ answer to this!)
Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?
Consider using leaps or glmulti to help with predictor selection.

Here is what we found looking for information on the ‘avocado’ data. We are treating this information as reliable.
“The table represents weekly retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.”
Relevant info for understanding ‘obscure’ variable names:
AveragePrice - the average price of a single avocado
Region - the city or region of the observation, i.e. where avocados were sold
Total Volume - total number of avocados sold
4046 - total number of small avocados sold (PLU 4046)
4225 - total number of medium avocados sold (PLU 4225)
4770 - total number of large avocados sold (PLU 4770)
According to the description above, the recorded average price is per avocado regardless of bag size, so we can drop the bag variables. Region also doesn’t seem to have a direct relationship with average price, so it may be safe and beneficial to drop it too.
The x1 variable records the week in which sales were recorded, in a 52-weeks-per-year format. Although our brief does not cover time series or forecasting, we can still investigate whether seasonality has an impact on average price. Avocados are very sensitive to variations in temperature, so weather patterns may affect production and potentially prices. We have decided to keep only the data for 2015 to 2017 and drop the partial 2018 data. Working with complete years should help, especially if seasons play some role in average price.
So, we’ll focus on ‘average price’, ‘type’ and ‘total volume’. We’ll use ‘x1’, ‘date’ and ‘year’ to engineer variables that will let us explore seasonality.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggfortify)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(modelr)
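avocado_df_exp <- read_csv("data/avocado.csv") %>%
  clean_names() %>%
  # keep week index, date, price, volume and size counts, plus type and year;
  # this drops the bag columns and region
  select(x1:x4770, type:year) %>%
  rename(week = "x1",
         small = "x4046",
         medium = "x4225",
         large = "x4770") %>%
  # keep the complete years 2015-2017 only; 2018 is partial
  filter(date <= as.Date("2017-12-31"))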
## Warning: Missing column names filled in: 'X1' [1]
##
## -- Column specification --------------------------------------------------------
## cols(
## X1 = col_double(),
## Date = col_date(format = ""),
## AveragePrice = col_double(),
## `Total Volume` = col_double(),
## `4046` = col_double(),
## `4225` = col_double(),
## `4770` = col_double(),
## `Total Bags` = col_double(),
## `Small Bags` = col_double(),
## `Large Bags` = col_double(),
## `XLarge Bags` = col_double(),
## type = col_character(),
## year = col_double(),
## region = col_character()
## )
avocado_tidy <- avocado_df_exp %>%
  # month is kept as character so lm() treats it as a categorical predictor
  mutate(month = as.character(month(date))) %>%
  # group months into meteorological seasons
  mutate(season = case_when(
    month %in% c("12", "1", "2") ~ "winter",
    month %in% c("3", "4", "5") ~ "spring",
    month %in% c("6", "7", "8") ~ "summer",
    month %in% c("9", "10", "11") ~ "autumn"
  )) %>%
  mutate(type = as.factor(type),
         season = as.factor(season),
         year = as.factor(year)) %>%
  # week stays numeric; the factor conversion below is deliberately left disabled
  # mutate(week = as.factor(week)) %>%
  select(-date)
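The month-to-season mapping above can also be written as a lookup vector. The following is a minimal base-R sketch of that idea; `month_to_season()` and the toy dates are hypothetical illustrations and are not part of the original script or data.

```r
# Hypothetical helper: map a month number (1-12) to a season
# using a lookup vector instead of case_when().
month_to_season <- function(month_num) {
  lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
              "summer", "summer", "autumn", "autumn", "autumn", "winter")
  lookup[as.integer(month_num)]
}

# Toy dates for illustration only; these are not rows from avocado.csv.
toy_dates   <- as.Date(c("2015-01-04", "2016-04-10", "2017-07-16", "2017-12-31"))
toy_months  <- as.integer(format(toy_dates, "%m"))
toy_seasons <- factor(month_to_season(toy_months))

print(data.frame(date = toy_dates, month = toy_months, season = toy_seasons))
```

The lookup accepts the month either as a number or as the character form used in avocado_tidy, because it converts its input with as.integer().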
We suspect ‘total volume’ is strongly correlated with the avocado size counts, so we check this and, if it is, drop the size variables.
avocado_tidy %>%
select(total_volume:large) %>%
ggpairs()
avocado_tidy <- avocado_tidy %>%
select(-c(small, medium, large))
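The same check can be done numerically with cor(). The sketch below uses simulated data in which the size counts add up to the total, purely to illustrate the mechanics; the 0.9 cutoff is an assumption for the example, not a rule from the brief, and none of the values come from avocado.csv.

```r
set.seed(1)  # simulated toy data, not the avocado data
n <- 200
small  <- rlnorm(n, meanlog = 10, sdlog = 1)
medium <- small * runif(n, 0.8, 1.2)
large  <- small * runif(n, 0.05, 0.15)
toy <- data.frame(total_volume = small + medium + large, small, medium, large)

# Pairwise Pearson correlations, the numbers ggpairs() shows in its upper panels
cor_mat <- cor(toy)
print(round(cor_mat, 2))

# Columns whose correlation with total_volume exceeds the (assumed) 0.9 cutoff
high <- names(which(abs(cor_mat["total_volume", ]) > 0.9))
redundant <- setdiff(high, "total_volume")
print(redundant)
```

Predictors flagged this way carry almost the same information as total_volume, which is the argument for keeping only one of them in the model.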
Let’s look at summary statistics.
summary(avocado_tidy)
## week average_price total_volume type
## Min. : 0.00 Min. :0.44 Min. : 85 conventional:8478
## 1st Qu.:13.00 1st Qu.:1.10 1st Qu.: 10460 organic :8475
## Median :26.00 Median :1.37 Median : 104849
## Mean :25.66 Mean :1.41 Mean : 834110
## 3rd Qu.:39.00 3rd Qu.:1.67 3rd Qu.: 423186
## Max. :52.00 Max. :3.25 Max. :61034457
## year month season
## 2015:5615 Length:16953 autumn:4212
## 2016:5616 Class :character spring:4320
## 2017:5722 Mode :character summer:4210
## winter:4211
##
##
Total volume is extremely right-skewed (the mean is roughly eight times the median), so this will affect our models. We need to look into this.
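One quick way to quantify the skew is the ratio of mean to median, and to see how a log transform changes it. In the sketch below the first two numbers are copied from the summary() output above; the simulated log-normal sample and the `skew_ratio()` helper are assumptions made for illustration only.

```r
# Values copied from the summary() output above
mean_vol   <- 834110
median_vol <- 104849
print(mean_vol / median_vol)  # far above 1, which indicates strong right skew

# Hypothetical helper: mean/median ratio as a rough skew indicator
skew_ratio <- function(x) mean(x) / median(x)

# Simulated right-skewed data (illustration only, not the avocado data)
set.seed(42)
toy_volume <- rlnorm(1000, meanlog = 11.5, sdlog = 2)
print(c(raw = skew_ratio(toy_volume), logged = skew_ratio(log(toy_volume))))
```

On the log scale the mean and median sit close together, which is why the plots and models further down use log(total_volume).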
total_vol_by_type <- avocado_tidy %>%
group_by(type) %>%
summarise(avg_total_vol= mean(total_volume)) %>%
mutate(pct = prop.table(avg_total_vol) * 100)
## `summarise()` ungrouping output (override with `.groups` argument)
total_vol_by_type
More than 97% of the avocado volume in the data is conventional, so it makes sense to focus on this type when modelling average price.
avocado_tidy_conv <- avocado_tidy %>%
filter(type == "conventional") %>%
select(-type)
avocado_tidy_org <- avocado_tidy %>%
filter(type == "organic") %>%
select(-type)
both_types <- ggplot(avocado_tidy) +
aes(x = total_volume, y = average_price) +
geom_point(size = 1L, colour = "#0c4c8a") +
geom_smooth(span = 0.75) +
scale_x_continuous(trans = "log") +
scale_y_continuous(trans = "log") +
labs(title = "Average price decreases when Total Volume increases") +
theme_minimal()
conventional <- ggplot(avocado_tidy_conv) +
aes(x = total_volume, y = average_price) +
geom_point(size = 1L, colour = "#0c4c8a") +
geom_smooth(span = 0.75) +
scale_x_continuous(trans = "log") +
scale_y_continuous(trans = "log") +
labs(title = "Average price decreases when Total Volume increases") +
theme_minimal()
organic <- ggplot(avocado_tidy_org) +
aes(x = total_volume, y = average_price) +
geom_point(size = 1L, colour = "#0c4c8a") +
geom_smooth(span = 0.75) +
scale_x_continuous(trans = "log") +
scale_y_continuous(trans = "log") +
labs(title = "Average price decreases when Total Volume increases") +
theme_minimal()
both_types
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
conventional
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
organic
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
ggplot(avocado_df_exp) +
aes(x = date, y = average_price, colour = type) +
geom_line(size = 1L) +
scale_color_hue() +
labs(title = "Average Price has a certain degree of seasonality") +
theme_minimal() +
facet_wrap(vars(type))
ggplot(avocado_df_exp) +
aes(x = type, y = average_price, fill = type) +
geom_boxplot() +
scale_fill_hue() +
labs(title = "As expected average price is higher for organic type") +
theme_minimal()
ggplot(avocado_df_exp) +
aes(x = date, weight = total_volume) +
geom_bar(fill = "#0c4c8a") +
labs(title = "Total Volume also shows a seasonal pattern") +
theme_minimal()
avocado_tidy %>%
ggpairs(aes(colour = type, alpha = 0.5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
avocado_tidy %>%
ggpairs(aes(colour = season, alpha = 0.5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mod_type <- lm(log(average_price) ~ type, data = avocado_tidy)
mod_type
##
## Call:
## lm(formula = log(average_price) ~ type, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic
## 0.1222 0.3591
summary(mod_type)
##
## Call:
## lm(formula = log(average_price) ~ type, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.30232 -0.13775 0.00724 0.15524 0.69732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.122230 0.002517 48.57 <2e-16 ***
## typeorganic 0.359108 0.003559 100.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2317 on 16951 degrees of freedom
## Multiple R-squared: 0.3752, Adjusted R-squared: 0.3752
## F-statistic: 1.018e+04 on 1 and 16951 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type)
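Because mod_type models log(average_price), its coefficients are on the log scale. A small sketch of how they can be read back in price terms, using the estimates copied from the summary above (the variable names are ours):

```r
# Coefficients copied from summary(mod_type) above
b0 <- 0.122230  # intercept: mean log price for conventional avocados
b1 <- 0.359108  # typeorganic: difference in mean log price, organic vs conventional

conventional_price <- exp(b0)        # geometric-mean price, conventional
organic_price      <- exp(b0 + b1)   # geometric-mean price, organic
pct_premium        <- (exp(b1) - 1) * 100

print(round(c(conventional = conventional_price,
              organic      = organic_price,
              pct_premium  = pct_premium), 2))
```

Read this way, the model associates organic avocados with a typical price roughly 43% above conventional ones, and type alone accounts for about 38% of the variance in log price (R-squared 0.3752).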
mod_total_volume <- lm(average_price ~ log(total_volume), data = avocado_tidy)
mod_total_volume
##
## Call:
## lm(formula = average_price ~ log(total_volume), data = avocado_tidy)
##
## Coefficients:
## (Intercept) log(total_volume)
## 2.5746 -0.1032
summary(mod_total_volume)
##
## Call:
## lm(formula = average_price ~ log(total_volume), data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06658 -0.23668 -0.03644 0.19778 1.67839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.574617 0.012777 201.50 <2e-16 ***
## log(total_volume) -0.103156 0.001109 -92.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3327 on 16951 degrees of freedom
## Multiple R-squared: 0.3378, Adjusted R-squared: 0.3378
## F-statistic: 8647 on 1 and 16951 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_total_volume)
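In mod_total_volume the predictor is logged but the response is not, so the slope describes the change in price for a multiplicative change in volume. A minimal sketch using the estimate from the summary above; `effect_of_multiplier()` is a hypothetical helper introduced for the example:

```r
# Slope copied from summary(mod_total_volume) above
b_log_volume <- -0.103156

# Change in average price when total volume is multiplied by k
effect_of_multiplier <- function(k, beta = b_log_volume) beta * log(k)

print(round(c(plus_1_pct = effect_of_multiplier(1.01),
              doubling   = effect_of_multiplier(2),
              tenfold    = effect_of_multiplier(10)), 4))
```

So a doubling of total volume is associated with an average price about 7 cents lower. This is an association in the data, not evidence that volume drives price.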
mod_month <- lm(average_price ~ month, data = avocado_tidy)
mod_month
##
## Call:
## lm(formula = average_price ~ month, data = avocado_tidy)
##
## Coefficients:
## (Intercept) month10 month11 month12 month2 month3
## 1.28919 0.29050 0.16638 0.04193 -0.02957 0.04178
## month4 month5 month6 month7 month8 month9
## 0.08519 0.05741 0.11978 0.17289 0.22333 0.28347
summary(mod_month)
##
## Call:
## lm(formula = average_price ~ month, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.99265 -0.30265 -0.03265 0.25444 1.79562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.28919 0.01018 126.589 < 2e-16 ***
## month10 0.29050 0.01440 20.170 < 2e-16 ***
## month11 0.16638 0.01468 11.336 < 2e-16 ***
## month12 0.04193 0.01468 2.856 0.00429 **
## month2 -0.02957 0.01499 -1.973 0.04854 *
## month3 0.04178 0.01468 2.846 0.00443 **
## month4 0.08519 0.01468 5.805 6.56e-09 ***
## month5 0.05741 0.01440 3.986 6.74e-05 ***
## month6 0.11978 0.01500 7.987 1.47e-15 ***
## month7 0.17289 0.01440 12.004 < 2e-16 ***
## month8 0.22333 0.01468 15.216 < 2e-16 ***
## month9 0.28347 0.01499 18.910 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.396 on 16941 degrees of freedom
## Multiple R-squared: 0.06224, Adjusted R-squared: 0.06163
## F-statistic: 102.2 on 11 and 16941 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_month)
mod_week <- lm(average_price ~ week, data = avocado_tidy)
mod_week
##
## Call:
## lm(formula = average_price ~ week, data = avocado_tidy)
##
## Coefficients:
## (Intercept) week
## 1.521376 -0.004322
summary(mod_week)
##
## Call:
## lm(formula = average_price ~ week, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03138 -0.30958 -0.03357 0.25855 1.80855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5213759 0.0061104 248.98 <2e-16 ***
## week -0.0043223 0.0002052 -21.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4036 on 16951 degrees of freedom
## Multiple R-squared: 0.02551, Adjusted R-squared: 0.02545
## F-statistic: 443.8 on 1 and 16951 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_week)
mod_season <- lm(average_price ~ season, data = avocado_tidy)
mod_season
##
## Call:
## lm(formula = average_price ~ season, data = avocado_tidy)
##
## Coefficients:
## (Intercept) seasonspring seasonsummer seasonwinter
## 1.53615 -0.18560 -0.07357 -0.24209
summary(mod_season)
##
## Call:
## lm(formula = average_price ~ season, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.95615 -0.30405 -0.03257 0.25595 1.81945
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.536147 0.006130 250.602 <2e-16 ***
## seasonspring -0.185600 0.008615 -21.545 <2e-16 ***
## seasonsummer -0.073574 0.008670 -8.486 <2e-16 ***
## seasonwinter -0.242093 0.008669 -27.925 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3978 on 16949 degrees of freedom
## Multiple R-squared: 0.05314, Adjusted R-squared: 0.05297
## F-statistic: 317.1 on 3 and 16949 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_season)
mod_year <- lm(average_price ~ year, data = avocado_tidy)
mod_year
##
## Call:
## lm(formula = average_price ~ year, data = avocado_tidy)
##
## Coefficients:
## (Intercept) year2016 year2017
## 1.37559 -0.03695 0.13954
summary(mod_year)
##
## Call:
## lm(formula = average_price ~ year, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.07513 -0.29864 -0.03864 0.25487 1.91136
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.375590 0.005360 256.632 < 2e-16 ***
## year2016 -0.036951 0.007580 -4.875 1.1e-06 ***
## year2017 0.139537 0.007545 18.494 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4017 on 16950 degrees of freedom
## Multiple R-squared: 0.03476, Adjusted R-squared: 0.03465
## F-statistic: 305.2 on 2 and 16950 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_year)
remaining_resid <- avocado_tidy %>%
add_residuals(mod_type) %>%
select(-c(average_price, type))
remaining_resid %>%
ggpairs(aes(colour = season, alpha = 0.5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mod_type_week <- lm(average_price ~ type + week, data = avocado_tidy)
mod_type_week
##
## Call:
## lm(formula = average_price ~ type + week, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic week
## 1.271166 0.500254 -0.004317
summary(mod_type_week)
##
## Call:
## lm(formula = average_price ~ type + week, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14577 -0.20821 -0.02663 0.19131 1.55832
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.271166 0.005377 236.40 <2e-16 ***
## typeorganic 0.500254 0.004865 102.83 <2e-16 ***
## week -0.004317 0.000161 -26.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3167 on 16950 degrees of freedom
## Multiple R-squared: 0.3999, Adjusted R-squared: 0.3998
## F-statistic: 5648 on 2 and 16950 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_week)
mod_type_total_volume <- lm(average_price ~ type + log(total_volume), data = avocado_tidy)
mod_type_total_volume
##
## Call:
## lm(formula = average_price ~ type + log(total_volume), data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic log(total_volume)
## 1.75552 0.33349 -0.04535
summary(mod_type_total_volume)
##
## Call:
## lm(formula = average_price ~ type + log(total_volume), data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14711 -0.20784 -0.02665 0.18296 1.60193
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.755519 0.023313 75.30 <2e-16 ***
## typeorganic 0.333487 0.008093 41.21 <2e-16 ***
## log(total_volume) -0.045349 0.001757 -25.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3172 on 16950 degrees of freedom
## Multiple R-squared: 0.3981, Adjusted R-squared: 0.398
## F-statistic: 5606 on 2 and 16950 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_total_volume)
mod_type_year <- lm(average_price ~ type + year, data = avocado_tidy)
mod_type_year
##
## Call:
## lm(formula = average_price ~ type + year, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic year2016 year2017
## 1.1255 0.5003 -0.0370 0.1396
summary(mod_type_year)
##
## Call:
## lm(formula = average_price ~ type + year, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32537 -0.18880 -0.01548 0.18463 1.66120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.125478 0.004838 232.624 < 2e-16 ***
## typeorganic 0.500313 0.004827 103.653 < 2e-16 ***
## year2016 -0.036995 0.005930 -6.238 4.53e-10 ***
## year2017 0.139580 0.005903 23.647 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3142 on 16949 degrees of freedom
## Multiple R-squared: 0.4092, Adjusted R-squared: 0.4091
## F-statistic: 3914 on 3 and 16949 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_year)
mod_type_month <- lm(average_price ~ type + month, data = avocado_tidy)
mod_type_month
##
## Call:
## lm(formula = average_price ~ type + month, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12 month2
## 1.03904 0.50028 0.29050 0.16638 0.04210 -0.02957
## month3 month4 month5 month6 month7 month8
## 0.04178 0.08519 0.05741 0.12016 0.17289 0.22333
## month9
## 0.28347
summary(mod_type_month)
##
## Call:
## lm(formula = average_price ~ type + month, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14982 -0.20115 -0.02194 0.19046 1.54548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.039045 0.008238 126.129 < 2e-16 ***
## typeorganic 0.500283 0.004715 106.112 < 2e-16 ***
## month10 0.290496 0.011163 26.023 < 2e-16 ***
## month11 0.166376 0.011376 14.626 < 2e-16 ***
## month12 0.042104 0.011378 3.701 0.000216 ***
## month2 -0.029572 0.011619 -2.545 0.010930 *
## month3 0.041775 0.011376 3.672 0.000241 ***
## month4 0.085194 0.011376 7.489 7.27e-14 ***
## month5 0.057414 0.011163 5.143 2.73e-07 ***
## month6 0.120165 0.011624 10.338 < 2e-16 ***
## month7 0.172890 0.011163 15.488 < 2e-16 ***
## month8 0.223328 0.011376 19.632 < 2e-16 ***
## month9 0.283468 0.011619 24.397 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3069 on 16940 degrees of freedom
## Multiple R-squared: 0.4367, Adjusted R-squared: 0.4363
## F-statistic: 1094 on 12 and 16940 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month)
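# mod_type was fitted on log(average_price) while mod_type_month uses average_price,
# so anova() cannot compare them (see the warning below); refit mod_type on
# average_price to get a valid nested-model comparison
anova(mod_type, mod_type_month)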
## Warning in anova.lmlist(object, ...): models with response '"average_price"'
## removed because response differs from model 1
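The warning above arises because anova() only compares models fitted to the same response. A minimal sketch of a valid nested comparison on simulated data follows; the toy data frame and the model names `small_mod` and `big_mod` are assumptions for illustration, not the avocado models.

```r
set.seed(7)  # simulated data for illustration only
toy <- data.frame(
  type  = factor(rep(c("conventional", "organic"), each = 60)),
  month = factor(rep(1:12, times = 10))
)
toy$price <- 1 + 0.5 * (toy$type == "organic") +
  0.02 * as.integer(toy$month) + rnorm(nrow(toy), sd = 0.1)

small_mod <- lm(price ~ type, data = toy)          # same response ...
big_mod   <- lm(price ~ type + month, data = toy)  # ... in both models

# F-test of whether adding month improves on the type-only model
comparison <- anova(small_mod, big_mod)
print(comparison)
```

Applied here, that would mean fitting a type-only model on average_price (rather than its log) before comparing it with mod_type_month.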
mod_type_season <- lm(average_price ~ type + season, data = avocado_tidy)
mod_type_season
##
## Call:
## lm(formula = average_price ~ type + season, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic seasonspring seasonsummer seasonwinter
## 1.28600 0.50029 -0.18560 -0.07346 -0.24203
summary(mod_type_season)
##
## Call:
## lm(formula = average_price ~ type + season, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.16069 -0.20284 -0.02284 0.18931 1.56931
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.286001 0.005325 241.48 <2e-16 ***
## typeorganic 0.500291 0.004751 105.29 <2e-16 ***
## seasonspring -0.185600 0.006698 -27.71 <2e-16 ***
## seasonsummer -0.073455 0.006741 -10.90 <2e-16 ***
## seasonwinter -0.242034 0.006741 -35.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3093 on 16948 degrees of freedom
## Multiple R-squared: 0.4276, Adjusted R-squared: 0.4274
## F-statistic: 3165 on 4 and 16948 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_season)
# residuals from the current model (type + month), not from the month-only model
remaining_resid <- avocado_tidy %>%
  add_residuals(mod_type_month) %>%
  select(-c(average_price, type, month))
remaining_resid %>%
ggpairs(aes(colour = season, alpha = 0.5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mod_type_month_total_volume <- lm(average_price ~ type + month + log(total_volume), data = avocado_tidy)
mod_type_month_total_volume
##
## Call:
## lm(formula = average_price ~ type + month + log(total_volume),
## data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11
## 1.60846 0.34000 0.28755 0.16171
## month12 month2 month3 month4
## 0.04255 -0.02532 0.04463 0.09181
## month5 month6 month7 month8
## 0.06661 0.12723 0.17821 0.22518
## month9 log(total_volume)
## 0.28350 -0.04358
summary(mod_type_month_total_volume)
##
## Call:
## lm(formula = average_price ~ type + month + log(total_volume),
## data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.13248 -0.19748 -0.01783 0.17872 1.47888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.608457 0.023279 69.096 < 2e-16 ***
## typeorganic 0.340002 0.007690 44.213 < 2e-16 ***
## month10 0.287554 0.010946 26.269 < 2e-16 ***
## month11 0.161713 0.011156 14.496 < 2e-16 ***
## month12 0.042553 0.011156 3.814 0.000137 ***
## month2 -0.025317 0.011394 -2.222 0.026299 *
## month3 0.044628 0.011155 4.001 6.34e-05 ***
## month4 0.091807 0.011157 8.228 < 2e-16 ***
## month5 0.066609 0.010951 6.082 1.21e-09 ***
## month6 0.127226 0.011401 11.160 < 2e-16 ***
## month7 0.178207 0.010948 16.278 < 2e-16 ***
## month8 0.225184 0.011154 20.188 < 2e-16 ***
## month9 0.283504 0.011393 24.885 < 2e-16 ***
## log(total_volume) -0.043575 0.001671 -26.081 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.301 on 16939 degrees of freedom
## Multiple R-squared: 0.4584, Adjusted R-squared: 0.458
## F-statistic: 1103 on 13 and 16939 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_total_volume)
mod_type_month_season <- lm(average_price ~ type + month + season, data = avocado_tidy)
mod_type_month_season
##
## Call:
## lm(formula = average_price ~ type + month + season, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12
## 1.03904 0.50028 0.29050 0.16638 0.04210
## month2 month3 month4 month5 month6
## -0.02957 0.04178 0.08519 0.05741 0.12016
## month7 month8 month9 seasonspring seasonsummer
## 0.17289 0.22333 0.28347 NA NA
## seasonwinter
## NA
summary(mod_type_month_season)
##
## Call:
## lm(formula = average_price ~ type + month + season, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14982 -0.20115 -0.02194 0.19046 1.54548
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.039045 0.008238 126.129 < 2e-16 ***
## typeorganic 0.500283 0.004715 106.112 < 2e-16 ***
## month10 0.290496 0.011163 26.023 < 2e-16 ***
## month11 0.166376 0.011376 14.626 < 2e-16 ***
## month12 0.042104 0.011378 3.701 0.000216 ***
## month2 -0.029572 0.011619 -2.545 0.010930 *
## month3 0.041775 0.011376 3.672 0.000241 ***
## month4 0.085194 0.011376 7.489 7.27e-14 ***
## month5 0.057414 0.011163 5.143 2.73e-07 ***
## month6 0.120165 0.011624 10.338 < 2e-16 ***
## month7 0.172890 0.011163 15.488 < 2e-16 ***
## month8 0.223328 0.011376 19.632 < 2e-16 ***
## month9 0.283468 0.011619 24.397 < 2e-16 ***
## seasonspring NA NA NA NA
## seasonsummer NA NA NA NA
## seasonwinter NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3069 on 16940 degrees of freedom
## Multiple R-squared: 0.4367, Adjusted R-squared: 0.4363
## F-statistic: 1094 on 12 and 16940 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_season)
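The NA coefficients for season are expected: season is derived entirely from month, so once month is in the model the season dummies add no new information and cannot be estimated. The sketch below reproduces the effect on simulated data; the toy data frame and `season_lookup` are assumptions for illustration.

```r
set.seed(3)  # simulated data for illustration only
toy <- data.frame(month = rep(1:12, times = 5))
season_lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
                   "summer", "summer", "autumn", "autumn", "autumn", "winter")
toy$season <- factor(season_lookup[toy$month])  # season is a function of month
toy$month  <- factor(toy$month)
toy$price  <- rnorm(nrow(toy), mean = 1.4, sd = 0.3)

fit <- lm(price ~ month + season, data = toy)

# Each month falls in exactly one season, so season is perfectly collinear with month
print(table(toy$month, toy$season))
print(coef(fit)[is.na(coef(fit))])  # the season terms that lm() could not estimate
```

This is why the fit statistics of mod_type_month_season are identical to those of mod_type_month: the two are the same model, and only one of month or season should be included.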
mod_type_month_year <- lm(average_price ~ type + month + year, data = avocado_tidy)
mod_type_month_year
##
## Call:
## lm(formula = average_price ~ type + month + year, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12 month2
## 1.00211 0.50030 0.29050 0.17149 0.03636 -0.02711
## month3 month4 month5 month6 month7 month8
## 0.04689 0.07948 0.06746 0.12279 0.17289 0.22844
## month9 year2016 year2017
## 0.28593 -0.03730 0.14070
summary(mod_type_month_year)
##
## Call:
## lm(formula = average_price ~ type + month + year, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.25000 -0.18917 -0.01311 0.18043 1.49439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.002107 0.008701 115.174 < 2e-16 ***
## typeorganic 0.500303 0.004565 109.594 < 2e-16 ***
## month10 0.290496 0.010809 26.876 < 2e-16 ***
## month11 0.171489 0.011025 15.554 < 2e-16 ***
## month12 0.036364 0.011019 3.300 0.000969 ***
## month2 -0.027110 0.011253 -2.409 0.015996 *
## month3 0.046888 0.011025 4.253 2.12e-05 ***
## month4 0.079484 0.011017 7.214 5.65e-13 ***
## month5 0.067464 0.010816 6.237 4.56e-10 ***
## month6 0.122791 0.011257 10.908 < 2e-16 ***
## month7 0.172890 0.010809 15.995 < 2e-16 ***
## month8 0.228441 0.011025 20.720 < 2e-16 ***
## month9 0.285930 0.011253 25.410 < 2e-16 ***
## year2016 -0.037298 0.005621 -6.636 3.32e-11 ***
## year2017 0.140698 0.005600 25.122 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2972 on 16938 degrees of freedom
## Multiple R-squared: 0.4719, Adjusted R-squared: 0.4715
## F-statistic: 1081 on 14 and 16938 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_year)
# residuals from the current model (type + month + year), not from the year-only model
remaining_resid <- avocado_tidy %>%
  add_residuals(mod_type_month_year) %>%
  select(-c(average_price, type, month, year))
remaining_resid %>%
ggpairs(aes(colour = season, alpha = 0.5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# note: total_volume enters untransformed here, unlike the log(total_volume) models above
mod_type_month_year_total_volume <- lm(average_price ~ type + month + year + total_volume, data = avocado_tidy)
mod_type_month_year_total_volume
##
## Call:
## lm(formula = average_price ~ type + month + year + total_volume,
## data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12
## 1.011e+00 4.912e-01 2.894e-01 1.704e-01 3.578e-02
## month2 month3 month4 month5 month6
## -2.653e-02 4.667e-02 7.951e-02 6.805e-02 1.231e-01
## month7 month8 month9 year2016 year2017
## 1.728e-01 2.281e-01 2.852e-01 -3.686e-02 1.412e-01
## total_volume
## -5.769e-09
summary(mod_type_month_year_total_volume)
##
## Call:
## lm(formula = average_price ~ type + month + year + total_volume,
## data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.25008 -0.18825 -0.01059 0.17964 1.49500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.011e+00 8.755e-03 115.526 < 2e-16 ***
## typeorganic 4.912e-01 4.685e-03 104.844 < 2e-16 ***
## month10 2.894e-01 1.079e-02 26.822 < 2e-16 ***
## month11 1.704e-01 1.100e-02 15.485 < 2e-16 ***
## month12 3.578e-02 1.100e-02 3.253 0.00114 **
## month2 -2.653e-02 1.123e-02 -2.362 0.01817 *
## month3 4.667e-02 1.100e-02 4.242 2.23e-05 ***
## month4 7.951e-02 1.100e-02 7.231 4.99e-13 ***
## month5 6.805e-02 1.079e-02 6.304 2.98e-10 ***
## month6 1.231e-01 1.123e-02 10.957 < 2e-16 ***
## month7 1.728e-01 1.079e-02 16.018 < 2e-16 ***
## month8 2.281e-01 1.100e-02 20.727 < 2e-16 ***
## month9 2.852e-01 1.123e-02 25.399 < 2e-16 ***
## year2016 -3.686e-02 5.610e-03 -6.571 5.15e-11 ***
## year2017 1.412e-01 5.590e-03 25.257 < 2e-16 ***
## total_volume -5.769e-09 6.932e-10 -8.323 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2966 on 16937 degrees of freedom
## Multiple R-squared: 0.4741, Adjusted R-squared: 0.4736
## F-statistic: 1018 on 15 and 16937 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_year_total_volume)
mod_type_month_year_week <- lm(average_price ~ type + month + year + week, data = avocado_tidy)
mod_type_month_year_week
##
## Call:
## lm(formula = average_price ~ type + month + year + week, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12 month2
## 1.398290 0.500213 -0.022093 -0.177098 -0.347067 -0.061808
## month3 month4 month5 month6 month7 month8
## -0.021168 -0.023442 -0.071366 -0.050825 -0.035485 -0.015932
## month9 year2016 year2017 week
## 0.008115 -0.039921 0.145008 -0.008016
summary(mod_type_month_year_week)
##
## Call:
## lm(formula = average_price ~ type + month + year + week, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.23763 -0.18814 -0.01214 0.18029 1.49463
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.398290 0.089820 15.568 < 2e-16 ***
## typeorganic 0.500213 0.004563 109.633 < 2e-16 ***
## month10 -0.022093 0.071357 -0.310 0.7569
## month11 -0.177098 0.079426 -2.230 0.0258 *
## month12 -0.347067 0.087218 -3.979 6.94e-05 ***
## month2 -0.061808 0.013703 -4.510 6.51e-06 ***
## month3 -0.021168 0.018901 -1.120 0.2628
## month4 -0.023442 0.025703 -0.912 0.3618
## month5 -0.071366 0.033139 -2.154 0.0313 *
## month6 -0.050825 0.040759 -1.247 0.2124
## month7 -0.035485 0.048244 -0.736 0.4620
## month8 -0.015932 0.056232 -0.283 0.7769
## month9 0.008115 0.063689 0.127 0.8986
## year2016 -0.039921 0.005649 -7.067 1.64e-12 ***
## year2017 0.145008 0.005681 25.524 < 2e-16 ***
## week -0.008016 0.001809 -4.432 9.41e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.297 on 16937 degrees of freedom
## Multiple R-squared: 0.4725, Adjusted R-squared: 0.4721
## F-statistic: 1012 on 15 and 16937 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_year_week)
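The jump in the month standard errors once week is added suggests the two predictors carry almost the same information. A minimal sketch of how that overlap could be quantified with a hand-computed variance inflation factor, using hypothetical weekly dates for a single year rather than the avocado data itself (`toy`, `r2_week` and `vif_week` are illustrative names):

```r
# Hypothetical weekly dates for one year (not taken from the avocado data)
dates <- seq(as.Date("2015-01-04"), as.Date("2015-12-27"), by = "week")

toy <- data.frame(
  month = factor(as.integer(format(dates, "%m"))),
  week  = as.integer(format(dates, "%U"))
)

# VIF for week = 1 / (1 - R^2), where R^2 comes from regressing week on month
r2_week  <- summary(lm(week ~ month, data = toy))$r.squared
vif_week <- 1 / (1 - r2_week)
vif_week
```

A value far above the usual rule-of-thumb threshold of 10 would indicate that week is nearly determined by month, which is consistent with the unstable month coefficients in the summary above.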
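# Try season on top of month.
# The season coefficients come back NA ("not defined because of singularities"):
# the season dummies are linear combinations of terms already in the model,
# so season adds nothing once month is included.
mod_type_month_year_season <- lm(average_price ~ type + month + year + season, data = avocado_tidy)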
mod_type_month_year_season
##
## Call:
## lm(formula = average_price ~ type + month + year + season, data = avocado_tidy)
##
## Coefficients:
## (Intercept) typeorganic month10 month11 month12
## 1.00211 0.50030 0.29050 0.17149 0.03636
## month2 month3 month4 month5 month6
## -0.02711 0.04689 0.07948 0.06746 0.12279
## month7 month8 month9 year2016 year2017
## 0.17289 0.22844 0.28593 -0.03730 0.14070
## seasonspring seasonsummer seasonwinter
## NA NA NA
summary(mod_type_month_year_season)
##
## Call:
## lm(formula = average_price ~ type + month + year + season, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.25000 -0.18917 -0.01311 0.18043 1.49439
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.002107 0.008701 115.174 < 2e-16 ***
## typeorganic 0.500303 0.004565 109.594 < 2e-16 ***
## month10 0.290496 0.010809 26.876 < 2e-16 ***
## month11 0.171489 0.011025 15.554 < 2e-16 ***
## month12 0.036364 0.011019 3.300 0.000969 ***
## month2 -0.027110 0.011253 -2.409 0.015996 *
## month3 0.046888 0.011025 4.253 2.12e-05 ***
## month4 0.079484 0.011017 7.214 5.65e-13 ***
## month5 0.067464 0.010816 6.237 4.56e-10 ***
## month6 0.122791 0.011257 10.908 < 2e-16 ***
## month7 0.172890 0.010809 15.995 < 2e-16 ***
## month8 0.228441 0.011025 20.720 < 2e-16 ***
## month9 0.285930 0.011253 25.410 < 2e-16 ***
## year2016 -0.037298 0.005621 -6.636 3.32e-11 ***
## year2017 0.140698 0.005600 25.122 < 2e-16 ***
## seasonspring NA NA NA NA
## seasonsummer NA NA NA NA
## seasonwinter NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2972 on 16938 degrees of freedom
## Multiple R-squared: 0.4719, Adjusted R-squared: 0.4715
## F-statistic: 1081 on 14 and 16938 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(mod_type_month_year_season)
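The NA coefficients for season are what `lm()` reports when a term is perfectly collinear with terms already in the model. A small sketch on toy data, assuming season is a pure function of month (the `season_lookup` mapping and the simulated `price` are hypothetical, not the avocado data), shows the same behaviour and how `alias()` exposes the dependency:

```r
# Toy data (hypothetical): season is fully determined by month
set.seed(1)
toy <- data.frame(month = factor(rep(1:12, each = 10)))
season_lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
                   "summer", "summer", "autumn", "autumn", "autumn", "winter")
toy$season <- factor(season_lookup[as.integer(toy$month)])
toy$price  <- 1 + 0.02 * as.integer(toy$month) + rnorm(nrow(toy), sd = 0.1)

mod_toy <- lm(price ~ month + season, data = toy)

# The season dummies are linear combinations of the month dummies,
# so lm() cannot estimate them and returns NA
na_coefs <- names(coef(mod_toy))[is.na(coef(mod_toy))]
na_coefs

# alias() shows which earlier terms each dropped term depends on
alias(mod_toy)$Complete
```

In practice this means choosing one of month or season rather than including both.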
average_price_residual <- avocado_tidy %>%
  add_residuals(mod_type_month_year_total_volume) %>%
  select(-average_price)
coplot(resid ~ log(total_volume) | month,
       panel = function(x, y, ...) {
         points(x, y)
         abline(lm(y ~ x), col = "blue")
       },
       data = average_price_residual, columns = 6)
average_price_residual %>%
  ggplot(aes(x = log(total_volume), y = resid, colour = type)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
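# Candidate interaction 1: type:month.
# Only a few of the interaction terms are significant in the summary below,
# and adjusted R-squared barely moves (0.4736 to 0.4754).
mod_int_t1 <- lm(
  average_price ~ type + month + year + total_volume + type:month,
  data = avocado_tidy
)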
summary(mod_int_t1)
##
## Call:
## lm(formula = average_price ~ type + month + year + total_volume +
## type:month, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20608 -0.18751 -0.01144 0.17733 1.51395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.010e+00 1.137e-02 88.794 < 2e-16 ***
## typeorganic 4.948e-01 1.527e-02 32.399 < 2e-16 ***
## month10 3.101e-01 1.523e-02 20.360 < 2e-16 ***
## month11 1.720e-01 1.553e-02 11.076 < 2e-16 ***
## month12 3.353e-02 1.552e-02 2.160 0.0308 *
## month2 -3.410e-02 1.585e-02 -2.151 0.0315 *
## month3 9.247e-02 1.553e-02 5.956 2.64e-09 ***
## month4 9.964e-02 1.552e-02 6.420 1.40e-10 ***
## month5 6.373e-02 1.523e-02 4.183 2.89e-05 ***
## month6 1.153e-01 1.585e-02 7.271 3.73e-13 ***
## month7 1.753e-01 1.523e-02 11.510 < 2e-16 ***
## month8 2.027e-01 1.553e-02 13.057 < 2e-16 ***
## month9 2.588e-01 1.585e-02 16.326 < 2e-16 ***
## year2016 -3.686e-02 5.600e-03 -6.582 4.76e-11 ***
## year2017 1.412e-01 5.580e-03 25.301 < 2e-16 ***
## total_volume -5.749e-09 6.923e-10 -8.305 < 2e-16 ***
## typeorganic:month10 -4.151e-02 2.154e-02 -1.927 0.0540 .
## typeorganic:month11 -3.191e-03 2.195e-02 -0.145 0.8844
## typeorganic:month12 4.502e-03 2.195e-02 0.205 0.8375
## typeorganic:month2 1.513e-02 2.242e-02 0.675 0.4997
## typeorganic:month3 -9.159e-02 2.195e-02 -4.173 3.02e-05 ***
## typeorganic:month4 -4.027e-02 2.195e-02 -1.835 0.0665 .
## typeorganic:month5 8.631e-03 2.154e-02 0.401 0.6886
## typeorganic:month6 1.572e-02 2.243e-02 0.701 0.4834
## typeorganic:month7 -5.002e-03 2.154e-02 -0.232 0.8164
## typeorganic:month8 5.065e-02 2.195e-02 2.308 0.0210 *
## typeorganic:month9 5.282e-02 2.242e-02 2.356 0.0185 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2961 on 16926 degrees of freedom
## Multiple R-squared: 0.4762, Adjusted R-squared: 0.4754
## F-statistic: 591.9 on 26 and 16926 DF, p-value: < 2.2e-16
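# Candidate interaction 2: month:year.
# This helps more: adjusted R-squared rises to 0.5086 in the summary below,
# at the cost of 22 extra coefficients.
mod_int_t2 <- lm(
  average_price ~ type + month + year + total_volume + month:year,
  data = avocado_tidy
)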
summary(mod_int_t2)
##
## Call:
## lm(formula = average_price ~ type + month + year + total_volume +
## month:year, data = avocado_tidy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20897 -0.17156 -0.01063 0.16921 1.44350
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.123e+00 1.400e-02 80.230 < 2e-16 ***
## typeorganic 4.917e-01 4.527e-03 108.630 < 2e-16 ***
## month10 2.677e-02 1.950e-02 1.373 0.169853
## month11 -3.472e-02 1.850e-02 -1.877 0.060550 .
## month12 -5.950e-02 1.951e-02 -3.050 0.002294 **
## month2 -3.754e-02 1.950e-02 -1.925 0.054207 .
## month3 -2.854e-03 1.850e-02 -0.154 0.877401
## month4 1.873e-02 1.950e-02 0.961 0.336758
## month5 -1.949e-02 1.850e-02 -1.054 0.291990
## month6 3.483e-02 1.950e-02 1.786 0.074076 .
## month7 4.488e-02 1.950e-02 2.302 0.021353 *
## month8 7.965e-02 1.850e-02 4.306 1.67e-05 ***
## month9 8.424e-02 1.950e-02 4.320 1.57e-05 ***
## year2016 -1.241e-01 1.850e-02 -6.708 2.04e-11 ***
## year2017 -8.618e-02 1.850e-02 -4.659 3.21e-06 ***
## total_volume -5.437e-09 6.698e-10 -8.116 5.13e-16 ***
## month10:year2016 2.890e-01 2.616e-02 11.047 < 2e-16 ***
## month11:year2016 3.430e-01 2.616e-02 13.113 < 2e-16 ***
## month12:year2016 1.347e-01 2.689e-02 5.010 5.50e-07 ***
## month2:year2016 3.507e-02 2.688e-02 1.305 0.191959
## month3:year2016 -1.298e-02 2.616e-02 -0.496 0.619735
## month4:year2016 -5.362e-02 2.688e-02 -1.995 0.046049 *
## month5:year2016 -2.011e-02 2.542e-02 -0.791 0.429052
## month6:year2016 8.418e-03 2.688e-02 0.313 0.754129
## month7:year2016 1.162e-01 2.616e-02 4.441 9.00e-06 ***
## month8:year2016 9.115e-02 2.616e-02 3.484 0.000494 ***
## month9:year2016 1.032e-01 2.688e-02 3.840 0.000123 ***
## month10:year2017 4.465e-01 2.616e-02 17.066 < 2e-16 ***
## month11:year2017 2.732e-01 2.616e-02 10.444 < 2e-16 ***
## month12:year2017 1.451e-01 2.617e-02 5.545 2.98e-08 ***
## month2:year2017 -2.460e-02 2.688e-02 -0.915 0.359991
## month3:year2017 1.234e-01 2.616e-02 4.718 2.40e-06 ***
## month4:year2017 2.059e-01 2.616e-02 7.872 3.69e-15 ***
## month5:year2017 2.746e-01 2.616e-02 10.496 < 2e-16 ***
## month6:year2017 2.340e-01 2.689e-02 8.702 < 2e-16 ***
## month7:year2017 2.420e-01 2.616e-02 9.249 < 2e-16 ***
## month8:year2017 3.407e-01 2.616e-02 13.023 < 2e-16 ***
## month9:year2017 4.774e-01 2.688e-02 17.763 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2866 on 16915 degrees of freedom
## Multiple R-squared: 0.5097, Adjusted R-squared: 0.5086
## F-statistic: 475.2 on 37 and 16915 DF, p-value: < 2.2e-16
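Comparing adjusted R-squared values is one way to judge the interaction models. Another option is an F-test between nested models with `anova()`. A minimal sketch on the built-in `mtcars` data, used here only as a stand-in because the avocado data is not reproduced in this section (`base_mod`, `int_mod` and `cmp` are illustrative names):

```r
# Main-effects model and the same model plus an interaction (nested models)
base_mod <- lm(mpg ~ wt + factor(cyl), data = mtcars)
int_mod  <- lm(mpg ~ wt * factor(cyl), data = mtcars)

# F-test: do the extra interaction terms reduce the residual sum of squares
# by more than we would expect by chance?
cmp <- anova(base_mod, int_mod)
cmp

# The second row holds the test for the added terms
cmp$Df[2]          # number of extra coefficients
cmp[["Pr(>F)"]][2] # p-value for the comparison
```

With the models fitted above, the equivalent call would be `anova(mod_type_month_year_total_volume, mod_int_t2)`, since the main-effects model is nested inside the month:year model.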
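# Relative importance (lmg) of each predictor group in the main-effects model.
# With rela = TRUE the shares sum to 100% of the explained variance;
# in the output below type accounts for roughly three quarters of it.
relaimpo::calc.relimp(mod_type_month_year_total_volume, type = "lmg", rela = TRUE)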
## Response variable: average_price
## Total response variance: 0.1671173
## Analysis based on 16953 observations
##
## 15 Regressors:
## Some regressors combined in groups:
## Group month : month10 month11 month12 month2 month3 month4 month5 month6 month7 month8 month9
## Group year : year2016 year2017
##
## Relative importance of 4 (groups of) regressors assessed:
## month year type total_volume
##
## Proportion of variance explained by model: 47.41%
## Metrics are normalized to sum to 100% (rela=TRUE).
##
## Relative importance metrics:
##
## lmg
## month 0.13078681
## year 0.07399211
## type 0.75456058
## total_volume 0.04066050
##
## Average coefficients for different model sizes:
##
## 1group 2groups 3groups 4groups
## type 5.002927e-01 4.969983e-01 4.939747e-01 4.912087e-01
## month10 2.904960e-01 2.890097e-01 2.886308e-01 2.893588e-01
## month11 1.663762e-01 1.665884e-01 1.679718e-01 1.703928e-01
## month12 4.192540e-02 3.929673e-02 3.725483e-02 3.577504e-02
## month2 -2.957231e-02 -2.802169e-02 -2.698635e-02 -2.653008e-02
## month3 4.177503e-02 4.313927e-02 4.481640e-02 4.667318e-02
## month4 8.519383e-02 8.331178e-02 8.142483e-02 7.950803e-02
## month5 5.741402e-02 6.148067e-02 6.505841e-02 6.804710e-02
## month6 1.197779e-01 1.211728e-01 1.223033e-01 1.231046e-01
## month7 1.728902e-01 1.727509e-01 1.727154e-01 1.727836e-01
## month8 2.233277e-01 2.244754e-01 2.260974e-01 2.280602e-01
## month9 2.834678e-01 2.833520e-01 2.839625e-01 2.852349e-01
## year2016 -3.695078e-02 -3.646617e-02 -3.644299e-02 -3.685972e-02
## year2017 1.395372e-01 1.405557e-01 1.411118e-01 1.411752e-01
## total_volume -2.318360e-08 -1.739063e-08 -1.158561e-08 -5.769023e-09